Empirical Methodology for Crowdsourcing Ground Truth
The process of gathering ground truth data through human annotation is a
major bottleneck in the use of information extraction methods for populating
the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the
attempt to solve the issues related to volume of data and lack of annotators.
Typically these practices use inter-annotator agreement as a measure of
quality. However, in many domains, such as event detection, there is ambiguity
in the data, as well as a multitude of perspectives on the information
examples. We present an empirically derived methodology for efficiently
gathering ground truth data in a diverse set of use cases covering a variety
of domains and annotation tasks. Central to our approach is the use of
CrowdTruth metrics that capture inter-annotator disagreement. We show that
measuring disagreement is essential for acquiring a high quality ground truth.
We achieve this by comparing the quality of the data aggregated with CrowdTruth
metrics with majority vote, over a set of diverse crowdsourcing tasks: Medical
Relation Extraction, Twitter Event Identification, News Event Extraction and
Sound Interpretation. We also show that an increased number of crowd workers
leads to growth and stabilization in the quality of annotations, going against
the usual practice of employing a small number of annotators. Comment: in publication at the Semantic Web Journal.
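The contrast drawn above between majority vote and disagreement-aware aggregation can be illustrated with a minimal sketch. This is not the actual CrowdTruth implementation; the data and the simple fraction-of-workers score are purely illustrative of why an ambiguous item is better described by a score per label than by a single winning label.

```python
from collections import Counter

def majority_vote(annotations):
    """Pick the single most frequent label (ties broken arbitrarily)."""
    return Counter(annotations).most_common(1)[0][0]

def disagreement_scores(annotations):
    """Toy disagreement-aware scores in the spirit of CrowdTruth:
    each label gets the fraction of workers who chose it, so an
    ambiguous item yields several labels with mid-range scores
    instead of a single winner-takes-all answer."""
    counts = Counter(annotations)
    n = len(annotations)
    return {label: count / n for label, count in counts.items()}

# A hypothetical ambiguous event: 5 of 9 workers say "protest", 4 say "celebration".
votes = ["protest"] * 5 + ["celebration"] * 4
print(majority_vote(votes))        # protest
print(disagreement_scores(votes))  # both labels kept, with their support
```

Majority vote discards the four dissenting workers entirely, while the score dictionary preserves the signal that the item is genuinely ambiguous.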
ENABLING MEDICAL EXPERT CRITIQUING USING A BDI APPROACH
Expert critiquing systems were introduced to assist physicians in decision making, without forcing them to comply with a gold standard of care. Critiquing systems do this by providing critique on a physician’s decisions, rather than telling him/her exactly what to do. In order to perform this task, a critiquing system must have knowledge of the diagnosis and treatment processes, and must be able to link the actions performed by a physician to this knowledge. The development of formal languages for describing medical guidelines (protocols) and the nationwide introduction of electronic patient records (EPR) in the Netherlands facilitate the development of a new generation of medical critiquing systems. Essential to the success of this new generation of critiquing systems is the ability to match the actions prescribed in a medical guideline to the physician’s actions reported in the EPR. Some authors have claimed that such a matching process is infeasible. This paper shows, however, that a BDI (beliefs, desires and intentions) approach enables a highly successful matching process, thereby enabling expert critiquing based on an EPR.
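The matching idea described above can be sketched very roughly: instead of matching guideline steps to EPR actions by name, each action is linked to the intention it serves, and steps match when their intentions agree. All names and data below are hypothetical illustrations, not taken from the paper or any real guideline.

```python
# Hypothetical guideline: step -> intention the step is meant to achieve.
GUIDELINE = {
    "prescribe_metformin": "lower_blood_glucose",
    "order_hba1c_test": "monitor_glycemic_control",
}

# Hypothetical EPR: reported action -> intention inferred for it.
EPR_ACTIONS = {
    "prescribe_insulin": "lower_blood_glucose",   # different drug, same intention
    "order_hba1c_test": "monitor_glycemic_control",
}

def match_by_intention(guideline, epr_actions):
    """Return (guideline_step, epr_action) pairs whose intentions agree,
    so an intention-level match survives surface-level differences."""
    matches = []
    for step, intent in guideline.items():
        for action, epr_intent in epr_actions.items():
            if intent == epr_intent:
                matches.append((step, action))
    return matches

print(match_by_intention(GUIDELINE, EPR_ACTIONS))
```

Here `prescribe_insulin` matches the metformin step despite the different drug name, which is the kind of flexibility a name-based matcher would miss.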
Evaluating Medical Lexical Simplification: Rule-Based vs. BERT
Lexical simplification (LS) can narrow the communication gap between medical experts and laypeople by replacing medical terms with layperson counterparts. In this paper, we present: 1) a rule-based approach to LS using a consumer health vocabulary, and 2) an unsupervised approach using BERT to generate word candidates. Human evaluation shows that the unsupervised model performed better for simplicity and grammaticality, while the rule-based method was better at meaning preservation.
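The rule-based side of this comparison can be sketched as a simple substitution over a term-to-lay-equivalent table. The tiny vocabulary below is a hypothetical stand-in for a real consumer health vocabulary, not the resource used in the paper.

```python
# Illustrative term -> lay-equivalent table (stand-in for a real
# consumer health vocabulary).
LAY_VOCAB = {
    "myocardial infarction": "heart attack",
    "hypertension": "high blood pressure",
    "renal": "kidney",
}

def simplify(text, vocab=LAY_VOCAB):
    """Replace medical terms with lay equivalents,
    longest-match-first so multiword terms win over substrings."""
    for term in sorted(vocab, key=len, reverse=True):
        text = text.replace(term, vocab[term])
    return text

print(simplify("Patient with hypertension and prior myocardial infarction."))
# Patient with high blood pressure and prior heart attack.
```

A BERT-based alternative would instead mask the difficult term and rank the model's fill-in candidates, which trades the vocabulary's guaranteed meaning preservation for better fluency, matching the evaluation result reported above.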
Domain-Independent Quality Measures for Crowd Truth Disagreement
Abstract. Using crowdsourcing platforms such as CrowdFlower and Amazon Mechanical Turk for gathering human annotation data has now become a mainstream process. Such crowd involvement can reduce the time needed to solve an annotation task, and with a large number of annotators it can be a valuable source of annotation diversity. In order to harness this diversity across domains, it is critical to establish a common ground for quality assessment of the results. In this paper we report our experiences in optimizing and adapting crowdsourcing micro-tasks across domains, considering three aspects: (1) the micro-task template, (2) the quality measurements for the workers’ judgments and (3) the overall annotation workflow. We performed experiments in two domains, i.e. event extraction (MRP project) and medical relation extraction (Crowd-Watson project). The results confirm our main hypothesis that some aspects of the evaluation metrics can be defined in a domain-independent way for micro-tasks, assessing the parameters needed to harness the diversity of annotations and the useful disagreement between workers. This paper focuses specifically on the parameters relevant for the 'event extraction' ground-truth data collection and demonstrates their reusability from the medical domain.
Enabling protocol-based medical critiquing
Abstract. This paper investigates the combination of expert critiquing systems and formal medical protocols. Medical protocols might serve as a suitable basis for an expert critiquing system because of their ongoing acceptance and the rise of both evidence-based practice and evidence-based protocols.